Smart Paper Notes

Mining Adverse Drug Reactions from Unstructured Mediums at Scale

Hasham Ul Haq Veysel Kocaman David Talby

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-01-05

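Adverse drug reactions/events (ADR/ADE) have a major impact on patient health and healthcare costs. Detecting ADRs as early as possible and sharing them with regulators, pharmaceutical companies, and healthcare providers can prevent morbidity and save many lives. While most ADRs are not reported through formal channels, they are often documented in a variety of unstructured conversations, such as social media posts by patients, customer support call transcripts, or notes of meetings between healthcare providers and pharma sales representatives. In this paper, we propose a natural language processing (NLP) solution that detects ADRs in such unstructured free-text conversations and improves on previous work in three ways. First, a new named entity recognition (NER) model achieves new state-of-the-art accuracy on the ADE, CADEC, and SMM4H benchmark datasets (F1 scores of 91.75%, 78.76%, and 83.41%, respectively). Second, two new relation extraction (RE) models are introduced, one based on BioBERT and the other using crafted features over a fully connected neural network (FCNN); both perform on par with existing state-of-the-art models and outperform them when trained with a supplementary clinician-annotated RE dataset. Third, a new text classification model, which decides whether a conversation includes an ADR, achieves new state-of-the-art accuracy on the CADEC dataset (86.69% F1 score). The complete solution is implemented in a production-grade library built on top of Apache Spark, making it natively scalable and able to process millions of batch or streaming records on commodity clusters.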

Translated by Google Translate

Deeper Clinical Document Understanding Using Relation Extraction

Hasham Ul Haq , Veysel Kocaman , David Talby

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2021-12-25

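The surging volume of biomedical literature and digital clinical records creates a growing need for text mining techniques that can not only identify but also semantically relate entities in unstructured data. In this paper, we propose a text mining framework comprising named entity recognition (NER) and relation extraction (RE) models, which expands on previous work in three main ways. First, we introduce two new RE model architectures: an accuracy-optimized one based on BioBERT and a speed-optimized one using crafted features over a fully connected neural network (FCNN). Second, we evaluate both models on public benchmark datasets and obtain new state-of-the-art F1 scores on the 2012 i2b2 Clinical Temporal Relations challenge (F1 of 73.6, +1.2% over the previous SOTA), the 2010 i2b2 Clinical Relations challenge (F1 of 69.1, +1.2%), the 2019 Phenotype-Gene Relations dataset (F1 of 87.9, +8.5%), the 2012 Adverse Drug Events Drug-Reaction dataset (F1 of 90.0, +6.3%), and the 2018 n2c2 Posology Relations dataset (F1 of 96.7, +0.6%). Third, we show two practical applications of this framework: building a biomedical knowledge graph and improving the accuracy of mapping entities to clinical codes. The system is built using the Spark NLP library, which provides a production-grade, natively scalable, hardware-optimized, trainable, and tunable NLP framework.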

Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled Systems

Fitash Ul Haq , Donghwan Shin , Lionel Briand

Categories: Machine Learning

2022-10-27

Deep Neural Networks (DNNs) have been widely used to perform real-world tasks in cyber-physical systems such as Autonomous Driving Systems (ADS). Ensuring the correct behavior of such DNN-Enabled Systems (DES) is a crucial topic. Online testing is one of the promising modes for testing such systems with their application environments (simulated or real) in a closed loop taking into account the continuous interaction between the systems and their environments. However, the environmental variables (e.g., lighting conditions) that might change during the systems' operation in the real world, causing the DES to violate requirements (safety, functional), are often kept constant during the execution of an online test scenario due to the two major challenges: (1) the space of all possible scenarios to explore would become even larger if they changed and (2) there are typically many requirements to test simultaneously. In this paper, we present MORLOT (Many-Objective Reinforcement Learning for Online Testing), a novel online testing approach to address these challenges by combining Reinforcement Learning (RL) and many-objective search. MORLOT leverages RL to incrementally generate sequences of environmental changes while relying on many-objective search to determine the changes so that they are more likely to achieve any of the uncovered objectives. We empirically evaluate MORLOT using CARLA, a high-fidelity simulator widely used for autonomous driving research, integrated with Transfuser, a DNN-enabled ADS for end-to-end driving. The evaluation results show that MORLOT is significantly more effective and efficient than alternatives with a large effect size. In other words, MORLOT is a good option to test DES with dynamically changing environments while accounting for multiple safety requirements.

Traffic Management of Autonomous Vehicles using Policy Based Deep Reinforcement Learning and Intelligent Routing

Anum Mushtaq , Irfan ul Haq , Muhammad Azeem Sarwar , Asifullah Khan , Omair Shafiq

Categories: Machine Learning | Artificial Intelligence

2022-06-28

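Deep reinforcement learning (DRL) uses diverse, unstructured data and enables RL to learn complex policies in high-dimensional environments. Intelligent transportation systems (ITS) based on autonomous vehicles (AVs) offer an excellent playground for policy-based DRL. Deep learning architectures solve the computational challenges of traditional algorithms while helping with the real-world adoption and deployment of AVs. One of the main challenges in AV deployment is that it can worsen traffic congestion on roads if it is not managed reliably and efficiently. Considering the holistic effect of each vehicle and using efficient and reliable techniques can genuinely help optimize traffic flow management and reduce congestion. To this end, we propose an intelligent traffic control system that handles complex traffic congestion scenarios at intersections and behind intersections. We propose a DRL-based signal control system that dynamically adjusts traffic signals according to the current congestion situation at intersections. To deal with congestion on the roads behind an intersection, we use a re-routing technique to load-balance vehicles across the road network. To achieve the practical benefits of the proposed approach, we break down data silos and combine all the data coming from sensors, detectors, vehicles, and roads to achieve sustainable results. We use the SUMO micro-simulator for our simulations. The significance of our proposed approach is evident from the results.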

Study of Feature Importance for Quantum Machine Learning Models

Aaron Baughman , Kavitha Yogaraj , Raja Hebbar , Sudeep Ghosh , Rukhsan Ul Haq , Yoshika Chhabra

Categories: Machine Learning

2022-02-18

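Predictor importance is a crucial part of data preprocessing pipelines in classical and quantum machine learning (QML). This work presents the first study of its kind in which feature importance for QML models is explored and contrasted with that of their classical machine learning (CML) equivalents. We developed a hybrid quantum-classical architecture in which QML models are trained and feature importance values are calculated with classical algorithms on a real-world dataset. This architecture has been implemented on ESPN Fantasy Football data using Qiskit statevector simulators and IBM quantum hardware such as the IBMQ Mumbai and IBMQ Montreal systems. Even though we are in the noisy intermediate-scale quantum (NISQ) era, the results from physical quantum computing are promising. To accommodate the current quantum scale, we created data tiering, model aggregation, and novel validation methods. Notably, the feature importance magnitudes from the quantum models show much higher variation than those from classical models. We show through diversity measurements that equivalent QML and CML models are complementary. The diversity between QML and CML demonstrates that both approaches can contribute to a solution in different ways. In this paper, we focus on quantum support vector classifiers (QSVC), variational quantum circuits (VQC), and their classical counterparts. The ESPN and IBM Fantasy Football Trade Assistant combines advanced statistical analysis with the natural language processing of Watson Discovery to provide fair, personalized trade recommendations. Here, the player valuation data of each player is considered, and this work can be extended to calculate the feature importance of other QML models such as quantum Boltzmann machines.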

Orientation Aware Weapons Detection In Visual Data : A Benchmark Dataset

Nazeef Ul Haq , Muhammad Moazam Fraz , Tufail Sajjad Shah Hashmi , Muhammad Shahzad

Categories: Computer Vision

2021-12-04

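Automatic detection of weapons is important for improving the security and well-being of individuals, yet it remains a difficult task because of the large variety in the size, shape, and appearance of weapons. Viewpoint variation and occlusion make the task even harder. Furthermore, current object detection algorithms process rectangular regions, but a long, slender rifle may cover only a small part of such a region, while the rest contains irrelevant detail. To overcome these problems, we propose a CNN architecture for orientation-aware weapons detection, which provides oriented bounding boxes with improved weapons detection performance. The proposed model predicts orientation not only by treating the angle as a classification problem, dividing it into eight classes, but also by treating the angle as a regression problem. To train our weapon detection model, a new dataset comprising a total of 6400 weapon images was collected from the web and then manually annotated with oriented bounding boxes. Our dataset provides not only oriented bounding boxes as ground truth but also horizontal bounding boxes. We also provide our dataset in the formats of multiple modern object detectors to support further research in this area. The proposed model is evaluated on this dataset, and a comparative analysis with off-the-shelf object detectors shows the superior performance of the proposed model, measured with standard evaluation strategies. The dataset and the model implementation are publicly available at this link: https://bit.ly/2tyzicf.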
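
The idea of treating a box angle as an eight-way classification target can be illustrated with a minimal sketch. The abstract does not state the angle range or the bin layout, so the 180-degree period and the equal-width bins below are assumptions, and the function names are hypothetical rather than taken from the authors' implementation.

```python
def angle_to_class(angle_deg: float, num_classes: int = 8, period: float = 180.0) -> int:
    """Map an orientation angle to one of `num_classes` equal-width bins.

    Assumption: a box rotated by `period` degrees is equivalent to the
    original, so angles are wrapped into [0, period) before binning.
    """
    bin_width = period / num_classes
    wrapped = angle_deg % period
    # Guard against floating-point wrap-around landing exactly on `period`.
    return min(int(wrapped // bin_width), num_classes - 1)


def class_to_angle(class_id: int, num_classes: int = 8, period: float = 180.0) -> float:
    """Return the centre angle of a bin, usable as a coarse angle estimate."""
    bin_width = period / num_classes
    return (class_id + 0.5) * bin_width


if __name__ == "__main__":
    for angle in (0.0, 22.5, 100.0, 179.9, 185.0):
        cls = angle_to_class(angle)
        print(angle, "-> class", cls, "(centre", class_to_angle(cls), "deg)")
```

A regression head, as mentioned in the abstract, would predict the angle as a continuous value instead of a bin index; how the two outputs are combined is not described in the text above.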

Invalidator: Automated Patch Correctness Assessment via Semantic and Syntactic Reasoning

Thanh Le-Cong , Duc-Minh Luong , Xuan Bach D. Le , David Lo , Nhat-Hoa Tran , Bui Quang-Huy , Quyet-Thang Huynh

Categories: Machine Learning

2023-01-03

In this paper, we propose a novel technique, namely INVALIDATOR, to automatically assess the correctness of APR-generated patches via semantic and syntactic reasoning. INVALIDATOR reasons about program semantics via program invariants, while it also captures program syntax via language semantics learned from a large code corpus using a pre-trained language model. Given a buggy program and the developer-patched program, INVALIDATOR infers likely invariants on both programs. Then, INVALIDATOR determines that an APR-generated patch overfits if it (1) violates correct specifications or (2) maintains erroneous behaviors of the original buggy program. In case our approach fails to determine an overfitting patch based on invariants, INVALIDATOR utilizes a model trained on labeled patches to assess patch correctness based on program syntax. The benefit of INVALIDATOR is three-fold. First, INVALIDATOR is able to leverage both semantic and syntactic reasoning to enhance its discriminant capability. Second, INVALIDATOR does not require new test cases to be generated but instead only relies on the current test suite and uses invariant inference to generalize the behaviors of a program. Third, INVALIDATOR is fully automated. We have conducted our experiments on a dataset of 885 patches generated on real-world programs in Defects4J. Experiment results show that INVALIDATOR correctly classified 79% of overfitting patches, detecting 23% more overfitting patches than the best baseline. INVALIDATOR also substantially outperforms the best baselines by 14% and 19% in terms of Accuracy and F-Measure, respectively.
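
The two invariant-based conditions above can be sketched with plain set operations. This is only an illustration under stated assumptions: invariants are modeled as strings, "correct specifications" are taken to be the invariants inferred on the developer-patched program, and "erroneous behaviors" are taken to be invariants that hold on the buggy program but not on the developer-patched one. The abstract does not define these sets, and `is_overfitting` is a hypothetical name, not the tool's API.

```python
from typing import Set


def is_overfitting(buggy_inv: Set[str], correct_inv: Set[str], patched_inv: Set[str]) -> bool:
    """Flag a patch as overfitting using invariant sets (illustrative only)."""
    # Assumption: every invariant of the developer-patched program is a correct specification.
    correct_specs = set(correct_inv)
    # Assumption: invariants seen only on the buggy program describe erroneous behavior.
    error_behaviors = set(buggy_inv) - set(correct_inv)

    violates_correct_spec = bool(correct_specs - set(patched_inv))  # condition (1)
    keeps_error_behavior = bool(error_behaviors & set(patched_inv))  # condition (2)
    return violates_correct_spec or keeps_error_behavior


if __name__ == "__main__":
    buggy = {"x >= 0", "result == 0"}
    correct = {"x >= 0", "result == x * 2"}
    print(is_overfitting(buggy, correct, {"x >= 0", "result == x * 2"}))
    print(is_overfitting(buggy, correct, {"x >= 0", "result == 0"}))
```

When neither condition fires, the abstract describes falling back to a model trained on labeled patches; that step is not shown here.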

Conservation Tools: The Next Generation of Engineering--Biology Collaborations

Andrew Schulz , Cassie Shriver , Suzanne Stathatos , Benjamin Seleb , Emily Weigel , Young-Hui Chang , M. Saad Bhamla , David Hu , Joseph R. Mendelson III , .

Categories: Machine Learning

2023-01-03

The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.

Posterior Collapse and Latent Variable Non-identifiability

Yixin Wang , David M. Blei , John P. Cunningham

Categories: Machine Learning (Statistics) | Machine Learning

2023-01-02

Variational autoencoders model high-dimensional data by positing low-dimensional latent variables that are mapped through a flexible distribution parametrized by a neural network. Unfortunately, variational autoencoders often suffer from posterior collapse: the posterior of the latent variables is equal to its prior, rendering the variational autoencoder useless as a means to produce meaningful representations. Existing approaches to posterior collapse often attribute it to the use of neural networks or optimization issues due to variational approximation. In this paper, we consider posterior collapse as a problem of latent variable non-identifiability. We prove that the posterior collapses if and only if the latent variables are non-identifiable in the generative model. This fact implies that posterior collapse is not a phenomenon specific to the use of flexible distributions or approximate inference. Rather, it can occur in classical probabilistic models even with exact inference, which we also demonstrate. Based on these results, we propose a class of latent-identifiable variational autoencoders, deep generative models which enforce identifiability without sacrificing flexibility. This model class resolves the problem of latent variable non-identifiability by leveraging bijective Brenier maps and parameterizing them with input convex neural networks, without special variational inference objectives or optimization tricks. Across synthetic and real datasets, latent-identifiable variational autoencoders outperform existing methods in mitigating posterior collapse and providing meaningful representations of the data.
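
Posterior collapse, as described above, means the approximate posterior equals the prior. A minimal sketch of a common diagnostic follows, assuming a diagonal Gaussian posterior and a standard normal prior (neither is specified in the abstract): the KL divergence from the posterior to the prior is zero exactly when the two coincide. The function name is hypothetical.

```python
import math
from typing import Sequence


def kl_to_standard_normal(mu: Sequence[float], sigma: Sequence[float]) -> float:
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions.

    Per dimension: 0.5 * (sigma^2 + mu^2 - 1) - ln(sigma).
    """
    return sum(0.5 * (s * s + m * m - 1.0) - math.log(s) for m, s in zip(mu, sigma))


if __name__ == "__main__":
    # Posterior identical to the prior: the divergence is zero (collapse).
    print(kl_to_standard_normal([0.0, 0.0], [1.0, 1.0]))
    # Posterior that carries information about the input: the divergence is positive.
    print(kl_to_standard_normal([1.0, -0.5], [0.5, 0.8]))
```

A divergence near zero for every data point is a symptom of collapse; it does not by itself show whether the cause is non-identifiability, which is the question the paper addresses.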

Mapping smallholder cashew plantations to inform sustainable tree crop expansion in Benin

Leikun Yin , Rahul Ghosh , Chenxi Lin , David Hale , Christoph Weigl , James Obarowski , Junxiong Zhou , Jessica Till , Xiaowei Jia , Troy Mao

Categories: Computer Vision | Machine Learning

2023-01-01

Cashews are grown by over 3 million smallholders in more than 40 countries worldwide as a principal source of income. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew growers contributing 15% of the country's national export earnings. However, a lack of information on where and how cashew trees grow across the country hinders decision-making that could support increased cashew production and poverty alleviation. By leveraging 2.4-m Planet Basemaps and 0.5-m aerial imagery, newly developed deep learning algorithms, and large-scale ground truth datasets, we successfully produced the first national map of cashew in Benin and characterized the expansion of cashew plantations between 2015 and 2021. In particular, we developed a SpatioTemporal Classification with Attention (STCA) model to map the distribution of cashew plantations, which can fully capture texture information from discriminative time steps during a growing season. We further developed a Clustering Augmented Self-supervised Temporal Classification (CASTC) model to distinguish high-density versus low-density cashew plantations by automatic feature extraction and optimized clustering. Results show that the STCA model has an overall accuracy of 80% and the CASTC model achieved an overall accuracy of 77.9%. We found that the cashew area in Benin has doubled from 2015 to 2021 with 60% of new plantation development coming from cropland or fallow land, while encroachment of cashew plantations into protected areas has increased by 70%. Only half of cashew plantations were high-density in 2021, suggesting high potential for intensification. Our study illustrates the power of combining high-resolution remote sensing imagery and state-of-the-art deep learning algorithms to better understand tree crops in the heterogeneous smallholder landscape.
